Discovering hierarchical speech features using convolutional non-negative matrix factorization
نویسنده
چکیده
Discovering a representation that reflects the structure of a dataset is a first step for many inference and learning methods. This paper aims at finding a hierarchy of localized speech features that can be interpreted as parts. Non-negative matrix factorization (NMF) has been proposed recently for the discovery of parts-based localized additive representations. Here, I propose a variant of this method, convolutional NMF, that enforces a particular local connectivity with shared weights. Analysis starts from a spectrogram. The hidden representations produced by convolutional NMF are input to the same analysis method at the next higher level. Repeated application of convolutional NMF yields a sequence of increasingly abstract representations. These speech representations are parts-based, where complex higher-level parts are defined in terms of less complex lower-level ones.
منابع مشابه
Voice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
متن کاملIterative Weighted Non-smooth Non-negative Matrix Factorization for Face Recognition
Non-negative Matrix Factorization (NMF) is a part-based image representation method. It comes from the intuitive idea that entire face image can be constructed by combining several parts. In this paper, we propose a framework for face recognition by finding localized, part-based representations, denoted “Iterative weighted non-smooth non-negative matrix factorization” (IWNS-NMF). A new cost fun...
متن کاملSpeech Separation using Non-negative Features and Sparse Non-negative Matrix Factorization
This paper describes a method for separating two speakers in a single channel recording. The separation is performed in a low dimensional feature space optimized to represent speech. For each speaker, an overcomplete basis is estimated using sparse non-negative matrix factorization, and a mixture is separated by mapping the mixture onto the joint bases of the two speakers. The method is evaluat...
متن کاملSupervised vs. unsupervised learning of spectro temporal speech features
To overcome limitations of purely spectral speech features we previously introduced Hierarchical Spectro-Temporal (HIST) features. We could show that a combination of HIST and standard features does reduce recognition errors in clean and in noise. The HIST features consist of two hierarchical layers where the corresponding filter functions are learned in a data driven way. In this paper we inve...
متن کاملDiscovering Convolutive Speech Phones Using Sparseness and Non-negativity
Abstract Discovering a representation that allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by Non-negative Matrix Factorisation (NMF). Here, we present a convolutive NMF algorithm that includes a sparseness constraint on the activations and has multiplicative updates. In combination w...
متن کامل